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Abstract 

Literature addressing missing data handling for random coefficient models is particularly 
scant, and the few studies to date have focused on the fully conditional specification framework 
and “reverse random coefficient” imputation. Although it has not received much attention in the 
literature, a joint modeling strategy that uses random within-cluster covariance matrices to 
preserve cluster-specific associations (Yucel, 2011) is a promising alternative for random 
coefficient analyses. This study is apparently the first to directly compare these procedures. 
Analytic results suggest that both imputation procedures can introduce bias-inducing 
incompatibilities with a random coefficient analysis model. Problems with fully conditional 
specification result from an incorrect distributional assumption, whereas joint imputation uses an 
underparameterized model that assumes uncorrelated intercepts and slopes. Monte Carlo 
simulations suggest that biases from these issues are tolerable if the missing data rate is 10% or 
lower and the sample is comprised of at least 30 clusters with 15 observations per group. Further, 
fully conditional specification tends to be superior with intraclass correlations that are typical of 
cross-sectional data (e.g., ICC = .10), whereas the joint model is preferable with high values 


typical of longitudinal designs (e.g., ICC = .50). 
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A substantial body of methodological research supports missing data handling techniques 
that assume a missing at random (MAR) mechanism, whereby the probability of nonresponse on 
a given variable is unrelated to that variable’s scores after conditioning on observed data (Little 
& Rubin, 2002; Rubin, 1976). Maximum likelihood estimation and multiple imputation are the 
principal MAR-based methods that enjoy widespread use in the behavioral sciences. Relative to 
single-level analysis problems, the application of these approaches to multilevel missing data has 
received somewhat less attention in the literature. Existing research has largely focused on 
random intercept models (DiazOrdaz, Kenward, Gomes, & Grieve, 2016; Drechsler, 2015; 
Goldstein, Carpenter, Kenward, & Levin, 2009; Larson, 2011; Ltidke, Robitzsch, & Grund, 
2017; Mistler & Enders, 2017; Reiter, Raghunathan, & Kinney, 2006; Shin & Raudenbush, 
2007, 2010; Taljaard, Donner, & Klar, 2008; van Buuren, 2011; Yucel & Demirtas, 2010), and 
literature addressing random coefficient models is especially nascent (Enders, Keller, & Levy, 
2017; Grund, Liidke, & Robitzsch, 2016). 

Joint model imputation and fully conditional specification (also known in the literature as 
chained equations and sequential regression imputation) are the major multiple imputation 
frameworks for multilevel data (Carpenter, Goldstein, & Kenward, 2011; Enders et al., 2017; 
Enders, Mistler, & Keller, 2016; Goldstein, Bonnet, & Rocher, 2007; Goldstein et al., 2009; 
Schafer, 2001; Schafer & Yucel, 2002; van Buuren, 2011, 2012), and both are widely available 
in software packages. Consistent with their single-level counterparts, these procedures create 
several copies of the data, each containing different estimates of the missing values. Both employ 
an iterative algorithm that estimates multilevel model parameters and then samples missing 
values from a distribution that conditions on these quantities. Importantly, joint model 


imputation uses a multivariate multilevel regression model that features incomplete variables 
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regressed on complete variables, whereas fully conditional specification imputes variables one at 
a time from a sequence of univariate models with an incomplete variable regressed on complete 
and previously imputed variables. As explained later, univariate and multivariate imputation 
schemes offer very different approaches to random coefficient models. 

The few studies to examine missing data handling for random coefficient models have 
focused on the fully conditional specification imputation framework (Enders et al., 2017; Enders 
et al., 2016; Grund, Liidke, et al., 2016) popularized by van Buuren and colleagues (van Buuren, 
2011, 2012; van Buuren, Brand, Groothuis-Oudshoorn, & Rubin, 2006; van Buuren et al., 2014). 
In the context of a random slope analysis, fully conditional specification employs a “reverse 
random coefficient” approach (Grund, Liidke, et al., 2016) that treats y as a random predictor of 
an incomplete covariate, and vice versa. Simulation results suggest that fully conditional 
specification can produce negatively biased slope variance estimates (Enders et al., 2017; Grund, 
Liidke, et al., 2016), and analytic work shows that an incorrect distributional assumption is 
responsible for this problem (Enders, Du, & Keller, 2017, October). As an alternative, Yucel 
(2011) proposed a joint (multivariate) imputation strategy that uses random within-cluster 
covariance matrices to model cluster-specific associations. This promising approach was recently 
implemented in the R package jomo (Quartagno & Carpenter, 2016a), and the only study to 
examine its performance did so in the context of individual patient data meta-analysis 
(Quartagno & Carpenter, 2016b). As we show later, using within-cluster covariance matrices to 
model random coefficients is potentially problematic because the method cannot preserve 
covariance between the random intercepts and slopes, but the practical magnitude of this 


shortcoming is unclear. 
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Random coefficient models pose particularly challenging missing data problems because 
existing imputation frameworks can introduce bias. Promising new Bayesian imputation 
procedures are currently under development but are not yet widely available in software (Enders 
et al., 2017, October; Erler et al., 2016). Although fully conditional specification and joint 
modeling with random covariance matrices are both imperfect, they will not necessarily produce 
comparable results because their underlying models are quite different. As such, understanding 
the relative strengths and weaknesses of these strategies is important for selecting an appropriate 
imputation procedure. Because the methodology literature currently offers no guidance on this 
issue, the primary goal of our manuscript is to compare fully conditional specification and joint 
model imputation with random level-1 covariance matrices. We begin by describing each 
approach and discussing its compatibility with a random coefficient analysis, after which we use 
Monte Carlo computer simulations to evaluate parameter recovery in a wide range of conditions 
typical of applied research settings. The paper concludes with a real data analysis example. 

To keep the ensuing discussion as simple as possible, we focus on missing data handling 


for a two-level random slope model with a single predictor at each level. The analysis model is 


Vij = Bo + Bixij + Bow; + boj + bijxiz + ej (1) 


where yj is the outcome score for observation i in cluster /, x and w; are level-1 and level-2 
predictors, respectively, Bo is the intercept, and f, and f are the fixed effect slope coefficients 
for the two predictors. Consistent with the standard multilevel model formulation, we assume 


that the intercept and slope residuals, bo; and b,;, respectively, are multivariate normal with zero 
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means and an unstructured covariance matrix Z),. Similarly, the within-cluster residuals are 
normal with common variance a2. 

As an aside, readers may question our focus on multiple imputation, given that maximum 
likelihood is the default estimator in most multilevel software programs. Maximum likelihood is 
applicable to a wide range of single-level missing data problems, but it is arguably less flexible 
for multilevel missing data. Dedicated software programs that operate within the traditional 
mixed model framework allow incomplete outcomes (e.g., a longitudinal analysis where the 
number and configuration of measurements differs across respondents), but the absence of 
distributional assumptions for explanatory variables forces programs to exclude observations or 
clusters with missing predictor scores. The HLM program (Raudenbush, Bryk, & Congdon, 
2013) appears to be the only dedicated multilevel software package that can accommodate 
incomplete explanatory variables, but this functionality is limited to random intercept models 
with normally distributed predictors (Shin & Raudenbush, 2007, 2010; Shin & Raudenbush, 
2013). The multilevel structural equation modeling framework (Mehta & Neale, 2005; Rabe- 
Hesketh, Skondral, & Zheng, 2012; Stapleton, 2013) is a second option for fitting random 
coefficient models with missing data. Mp/us (Muthén & Muthén, 1998—2017) is the only 
package we know of that can accommodate random slope predictors with missing data, but no 
technical documentation is available describing this functionality. Maximum likelihood 
estimators for incomplete multilevel data tend to invoke complicated transformations of the 
population joint distribution (Shin & Raudenbush, 2007, 2010), making it difficult to understand 


and explain their performance without also knowing their technical underpinnings'. As such, we 


' We included the Mp/us implementation of maximum likelihood estimation in a subset of our 
simulations, and the method was uniformly biased. We do not report these results here because 
there is currently no technical documentation that helps us understand and explain the results. 
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focus on multiple imputation because it is arguably a more mature technology for incomplete 
multilevel data. 
Joint Model Imputation with Random Covariance Matrices 

The joint imputation framework is based on a multivariate regression model that is 
typically parameterized by regressing the set of incomplete variables on the set of complete 
variables, or by treating all variables as outcomes regardless of missing data pattern (Asparouhov 
& Muthén, 2010; Carpenter & Kenward, 2013; Goldstein et al., 2009; Schafer & Yucel, 2002; 
Yucel, 2011). We describe the latter parameterization, focusing on a variant of the joint model 
that introduces heterogeneous within-cluster covariance matrices (Yucel, 2011). This approach 
holds promise for random coefficient models but has not received much attention in the 
literature. Note that we alter our notation in this section (e.g., replacing fs with ys, etc.) to 
clearly differentiate the imputation and analysis models, and we use alphanumeric subscripts to 
associate parameters to specific variables (e.g., yi) and Y(x)). 

The joint imputation model for the random coefficient analysis defines each observation 


as the sum of a grand mean and one or more normally distributed deviation scores 


Vij =Vo) + Wig) F Evi~”) 
Xig = Vex) FU) + Eijox) 
Wj = Yu) + Uw) (2) 


(1 ~N(0,E,) Ce Y 4 ~N (0,5.,) 


Es; 
Uw) ee 
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where y;.) is the regression intercept or grand mean, uj.) is a level-2 deviation score capturing 
the difference between a latent cluster mean and the grand mean (e.g., Ujry) = Hjcy) — Vy)» and 
&j() 18 a within-cluster deviation between an observation and its corresponding latent group 
mean (€.g., €ij(y) = Vij — Hj(y))- Further, Z,, is the level-2 covariance matrix of the between- 
cluster deviation scores, and 2 &j is the within-cluster covariance matrix for cluster j, as follows. 
2 


uy) Oy (yx) Ou (yw) . 


, Fey Fe jyx) 
Yy =| Fey Fey Fuawy) | Be; = : 
Gi, o2 
2 j@y) I(x) 3 
Cuw,y) ieias Ou) ( ) 


E,~ IW (v +7,S+ S.,) 


where JW denotes an inverse Wishart probability distribution for cluster j’s covariance matrix, uv 
and S are the degrees of freedom and scale matrix (i.e., center and spread) of the hierarchically- 
specified Wishart distribution of all random covariance matrices, n; is the number of 


observations in cluster /, and S &j is the within-cluster residual sum of squares and cross-products 


matrix for cluster 7. Importantly, allowing the covariance matrices to vary across clusters differs 
from the classic joint model formulation (Schafer, 2001; Schafer & Yucel, 2002) and provides a 
mechanism for preserving random slope variation. 

The computational machinery for joint model imputation uses an iterative Markov chain 
Monte Carlo (MCMC) algorithm consisting of two major steps. At each iteration, the algorithm 
first uses Bayesian estimation to generate model parameters and level-2 deviation scores, 


conditional on the filled-in data from the previous cycle, after which it updates imputations, 
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conditional on the current estimates. The estimation sequence generates a coefficient vector, y, 


between-cluster covariance matrix, Z,,, level-2 deviation scores, u = (uy, Ux), Uw) )> and a set 
of J within-cluster covariance matrices, 2, = (z Mitinaie ai The distribution of the random 


covariance matrices across all clusters requires two additional parameters, the degrees of 
freedom and scale matrix, uv and S (i.e., parameters defining the distribution’s center and spread, 
respectively). Roughly speaking, v and S can be can be viewed as quantities that define a pooled 
level-1 covariance matrix. 

The Bayesian estimation sequence views model parameters as random variables, and it 
uses Monte Carlo computer simulation to “sample” plausible estimates from a series of 
probability distributions. Specifically, each iteration ¢ of the MCMC algorithm applies the 


following estimation steps to the filled-in data 


1. Sample a coefficient vector y from WV (y juce-D, ge) data) 
2. Sample the level-2 covariance matrix 3 from W(x, ae); data) 


3. Sample the scale matrix $ from JW (s | vv, roe, data) 
7 (4) 
4. Sample the scalar degrees of freedom 0 from f (v | ees data) 


5. Sample level-2 residuals w© from W (u [y, pee sor) data) 


6. Sample level-1 covariance matrices eh from JW (z e; | yO, $s, 1, a©, data) 


where WV and JW are normal and inverse Wishart distributions, respectively, and the dot accents 


denote synthetic estimates generated via Monte Carlo computer simulation (van Buuren, 2012). 
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The full conditional distributions of y, Z,,, and u are given in a variety of resources that describe 
the classic joint model formulation (Carpenter & Kenward, 2013; Goldstein et al., 2009; Schafer 
& Yucel, 2002; Yucel, 2011). The conditional distributions of v and S are based on aggregate 
information from £, (Yucel, 2011, p. 359), and the final sampling step draws random covariance 
matrices from a distribution that leverages this pooled information as well as cluster-specific 
variation (see Equation 3). Finally, we use the generic symbol f to denote the conditional 
distribution of v because this quantity does not follow a standard parametric form. In addition to 
Yucel (2011), interested readers can consult Carpenter and Kenward (2013) and Quartagno and 
Carpenter (2016b) for additional technical details, including a discussion of prior distributions. 
After estimating the parameters and random effects, a final MCMC step updates missing 
values by sampling imputations from a multivariate normal distribution, the center and spread of 


which depend on the multilevel model parameters and level-2 residual terms. 


x9 |~w (yO +a, 20) (5) 


Recall that w has no within-cluster variation, so each w, is simply the sum of a grand mean and 
between-cluster residual term (i.¢., Wj = Yq) + Uj). The updated version of the completed 


data set carries forward to the next iteration, where MCMC estimation steps provide new 


? Cluster-specific covariance matrices replace a common within-cluster matrix in the classic joint 
model formulation, but the probability distributions for y, Z,,, and u are otherwise the same. The 
joint model uses improper uniform prior distributions for the coefficients and level-2 residual 
terms. Both the level-1 and level-2 covariance matrices use Wishart prior distributions with 
identity scale matrices and degrees of freedom equal to the dimension of the respective matrices. 
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parameter estimates and residual terms for the next round of imputation. The desired number of 
imputed data sets is obtained by iterating the MCMC algorithm and culling data sets at specified 
intervals in the chain (Schafer, 1997; Schafer & Olsen, 1998). 
Compatibility of Imputation and Analysis Models 

A vital consideration with multiple imputation is whether (and how) the imputation 
model preserves features of the analysis; methodologists have described this issue as 
congeniality (Meng, 1994; Schafer, 2003) and more recently as compatibility (Bartlett, Seaman, 
White, & Carpenter, 2014; Carpenter & Kenward, 2013). Yucel (2011) explained that modeling 
heterogeneous covariance matrices is “mimicking the idea underlying random effects” (p. 352), 
but he did not specifically examine random coefficient models’. The covariance matrix 
formulation makes it difficult to see which parts of the imputation model are preserving 
important features of the random slope analysis, but we can informally explore compatibility by 


using estimates of &j and &,, to define within- and between-cluster regression parameters that 


are analogous to those of the analysis model. To be clear, we are not examining whether the 
random coefficient analysis and imputation model are formally compatible with the joint 
distribution of the population. Rather, we are deriving regression model parameters as functions 


of x &j and &,, in order to highlight major similarities and differences between the imputation and 


analysis models. Quartagno and Carpenter (2016b) provide a similar assessment in the context of 


meta-analysis models. 


> Quartagno and Carpenter (2016b) considered the application of random covariance matrices to 
various meta-analysis models. 
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To begin, consider the level-1 part of the imputation model from Equation 2. The X &j 


matrix contains variances and covariances of within-cluster deviation scores for cluster /, which 


we use to define a level-1 regression model with €;;(,) predicting &;;,y) 


Op. 
(vii -— Hin) = ae Hu — Hycxy) + ey = Maj (Huy — Hj) + ij 
I(x) 


(6) 


2 


2. 22 
Of, = Of... — 14.0, 
ej Ej) Vy 


&j(x) 

where 70, ; is the slope coefficient for cluster j, e;; 1s a within-cluster residual for observation i in 
cluster j, and oe, is the corresponding residual variance for that group. Equation 6 clearly shows 
that the random covariance matrices are responsible for preserving slope variation, as these 
parameters imply a unique regression coefficient for each group, 7, ;. Importantly, the level-1 
model has no mean structure or intercept that varies across level-2 units because the within- 
cluster deviation scores have an expectation of zero in all groups. To facilitate our subsequent 
comparison with the analysis model, we express the cluster-specific slopes as a function of an 
average coefficient and a between-cluster residual term, 71; = Y; + g1;. Substituting the right 


side of this expression into Equation 6 gives the following level-1 regression model. 
(Vij — Hyon) = Mais — Hic) + Gry aj — Hyon) + 4 (7) 
Next, the Z,, matrix contains variances and covariances of between-cluster deviation 


scores, which we can use to define a level-2 multiple regression model with uj.) and ujqy) 


predicting ujcy) 
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= 1. 


Fi Fucewy]  [PXcxy 
(Hin — Yor) = [Hie — Ye) ( — Yon) s 2 ee | + Jo; 
uwx) uw) ve 
Y 
= [Hjem -Yeo) -ron)l [y.| + 903 o) 


2 
ey a i 
Jo” ~Uy) 2 

3 Ori) Y3 


Obey) 
where y, and y3 are between-cluster slope coefficients, go; is a residual that captures 
unexplained variation in the cluster means, and Og: is the residual variance. 

Finally, combining the right and left sides of the within- and between-cluster regression 
equations and adding the grand mean of y to each side of the resulting expression gives a model 


with the same form as the random coefficient analysis. 


Vij = Vo) + M1 (ij — Hien) + ¥2(Hieo — Ye) + ¥3(Wj — Vw) 
(9) 
+ Joy + Gay (iy — Hye) + ey 


Comparing Equations | and 9 reveals three important differences between the imputation and 
analysis models, one of which is potentially problematic. First, Equation 9 includes an additional 
regression slope that partitions the relation between x and y into distinct within- and between- 
cluster components (i.e., y; and yz, respectively), as in the classic contextual effects model 
(Enders, 2013; Kreft, de Leeuw, & Aiken, 1995; Longford, 1989; Liidke, Marsh, Robitzsch, & 
Trautwein, 2011). Second, the imputation model accommodates a heterogeneous within-cluster 


residual variance structure (e.g., Oé, from Equation 6), whereas the analysis model assumes 
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constant variance. Incompatibilities such as these that arise from a rich imputation model with 
ancillary parameters are usually not problematic and may even be beneficial (Collins, Schafer, & 
Kam, 2001; Meng, 1994; Schafer, 2003). Although Equations 6 through 9 clearly confirm that 
random covariance matrices mimic random effects, they highlight an important caveat. 
Specifically, because X &j and the corresponding within-cluster regression model do not include a 
mean structure or intercept that varies across level-2 units, between-cluster variation in 7; (or 
equivalently, g,;) is necessarily orthogonal to functions of Z,,, notably the between-cluster 
residual term, go;. Consequently, an imputation model based on random covariance matrices 
could only be approximately compatible with a random coefficient analysis with uncorrelated 
intercepts and slopes (i.e., a block-diagonal XZ; ). This restriction could be detrimental to analyses 
where the level-2 covariance captures an interesting substantive phenomenon (e.g., growth 
models where initial status informs change rates). 
Fully Conditional Specification 

Whereas joint model approach uses a multivariate regression model to generate 
imputations, fully conditional specification cycles through incomplete variables one at a time, 
generating imputations from a sequence of univariate multilevel models that feature an 
incomplete variable regressed on all remaining variables, complete and previously imputed 
(Enders et al., 2017; Enders et al., 2016; van Buuren, 2011, 2012). Consistent with the previous 
section, we alter our notation (e.g., replacing Ps with ys, etc.) to clearly differentiate the 
imputation and analysis models, and we use alphanumeric subscripts to associate parameters to 
specific variables (e.g., Y(y) and yx). 

Returning to the random slope analysis from Equation 1, fully conditional specification 


applies the following imputation models 
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Vij = Yorr) + VQ) Xij + Va~yWj + V3— Xj + Uojeyy + Uj Xi + Ej) 


(10) 
Uoj(y) 2 
fase) ~N (0, Lucy) Eijqy~N (0, Oxy) 
Xig = Yow) + Vic Vij + V2ceyWj + VacxyVji + Uojcxy + Ur jx) Vij + Fife) 
(11) 
Uo j(x) 2 
Gs A) ~N(O2ue)) eijo~N (0, O20) 
Wj = Yow) + Yq Vj + Vac] + Uojcw) 
(12) 


Uo jw)~N (0, oxewy) 


where x; and y, are the level-2 cluster means computed from the imputed data for cluster j, Uo ;(.) 
and uy ;(.) are now between-cluster residual terms, and €;j(.) is a within-cluster residual with 
constant variance Ox. Notice that the round-robin imputation scheme features the outcome from 


one equation (the target of imputation) as a predictor in all other equations. Importantly, the 
models use a “reverse random coefficient” approach (Grund, Liidke, et al., 2016) preserve 
random slope variation, whereby x is a random predictor in the y imputation model, and vice 
versa. Also, note that latent rather than manifest variable group means can be used to preserve 
between-cluster variation among the incomplete variables (Grund, Ltidke, & Robitzsch, 2017). 
Analytic and simulation results suggest that manifest cluster means provide good performance in 
most situations but can introduce slight biases with low intraclass correlations, few observations 


per cluster, and extremely unbalanced cluster sizes (Grund et al., 2017). 
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Like joint model imputation, fully conditional specification employs an iterative MCMC 
algorithm that generates Bayesian estimates of model parameters and level-2 residual terms, 
conditional on the filled-in data, after which it updates the imputations, conditional on current 
estimates and residuals. Each iteration applies these estimation and imputation steps to the 
incomplete variables in a sequence. The model parameters for an incomplete level-1 variable g 


include a coefficient vector, y(q), level-2 residuals, ug), within-cluster residual variance 05(q)> 


and the level-2 residual covariance matrix Z whereas level-2 imputation requires only the 


u(q)? 
coefficients and residual variance (see Equation 12). Consistent with the joint model, Monte 


Carlo computer simulation samples plausible estimates from the series of probability 


distributions given below. 


1. Sample a coefficient vector Vp from NV (Yea) | ui Sane er Ca . , data) 


(t) (t) gz 1) w(t-1) 
2. Sample level-2 residuals Wa) from NV (uc) IViqy F e(a) Lu) , data) 


(13) 


3. Sample level-1 residual variance 6x *) from Tw (0 O5(q) | Vie we, data) 


4. Sample the level-2 covariance matrix a ) from JW (z u(q) | On: data) 


As before, the dot accents denote synthetic values generated via Monte Carlo simulation (van 
Buuren, 2012). A variety of resources provide the technical details about these distributions 


(Browne & Draper, 2000; Enders et al., 2017; van Buuren, 2012)4. 


4 As described in the technical appendix of Enders et al. (2017) improper uniform prior 
distributions are adopted for the coefficients and level-2 residual terms. The inverse Wishart 
distribution for the level-2 covariance matrices Z,,(q) implemented an identity prior with p + 1 
degrees of freedom, which is a common recommendation in the literature. Similarly, the inverse 
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After estimating the parameters and random effects for incomplete variable q, a final 
MCMC step updates missing values by sampling imputations from a univariate normal 
distribution, the center and spread of which depend on the model parameters. The imputation 


steps for our example are as follows. 


(t) . 5 «(t-1) : - (t-1) F = (t-1) . . -(t-1) .2 
vig ~ N (Yoo TYV10)*%ij  FV20)Wji FT Y3Q)%] FT Uojo) F Urjon*iy dq) (14) 


XP ~N(Vow + GOI? + Vaaawe? + PaeI + tojen + tayo? 62u)) 5) 


wi oN (Yow) 7 Vande” as Yawk”, gi) a 


To simplify notation, we omit the iteration superscript from the parameters and residual terms 
because these quantities always originate from iteration ¢. 
Compatibility of Imputation and Analysis Models 

Consistent with the joint model, the imputation models in Equations 10 and 11 are more 
flexible than necessary because they include a contextual effects-type specification that partitions 
the relation between x and y into unique within- and between-cluster components (Carpenter & 
Kenward, 2013; Enders et al., 2017; Enders et al., 2016)°. The primary problem with fully 
conditional specification is its use of reverse regression to preserve random slope variation (e.g., 


y is arandom predictor of x, and vice versa). As noted previously, simulation studies suggest that 


Wishart prior for the residual variance uses a scale parameter of unity with two degrees of 
freedom. 

> Carpenter and Kenward (2013, p. 220) attribute the idea of including cluster means as 
predictors to a personal communication from Ian White. 
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this strategy can produce negatively biased slope variance estimates (Enders et al., 2017; Enders 
et al., 2016; Grund, Liidke, et al., 2016), and an incorrect distributional assumption appears to be 
responsible for this problem (Enders et al., 2017, October). Specifically, when a random 
predictor is missing, assuming a normal distribution for the level-2 residuals in the analysis or y 
imputation model precludes the possibility that the corresponding residuals from the reverse 
regression are also normal. MCMC estimation violates this assumption when it samples level-2 
residuals for x’s imputation model in Equation 11. A related issue occurs in the single-level 
context with reverse regressions that include a product term (Bartlett et al., 2014; Gelman & 
Raghunathan, 2001). 

To provide additional insight into the incompatibility problem, we can express the 
random effects from the reverse regression as a function of the corresponding terms from the 


analysis model. To begin, we solve for x in the analysis model as follows. 


ee (Bo + Baw; + bo; + ei;) (17) 
2 By + by; 


3 
Next, the right side of this expression replaces x;; in the left side of the reverse random 
coefficient model in Equation 11. For simplicity, we omit the cluster means from the right side of 
the imputation model since they do not appear in the analysis. Finally, solving for the imputation 


model’s random effects gives the following expressions. 


=(Bo + boj) 
You) + Yoja) =—E ap (18) 
J 
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1 
Yiqx) FUtj@) = Bethy, 
J 


Pe ei 
YO By + by; 


Equation 18 shows that assuming normality for bo; and b,; precludes the possibility that uo jx), 
Uy j(x), and &;;(,) are normal because neither the inverse of a normal variate nor the ratio of two 
Gaussians are members of the normal distribution family. In fact, the above expressions can be 
quite skewed and kurtotic. Consequently, applying the MCMC estimation steps from Equation 
13 to the reverse random coefficient model violates a key distributional assumption that ends up 
impacting imputation quality. As an aside, this issue does not occur when x is complete, nor does 
it occur in random intercept models. 
Computer Simulation Study 

To date, the only study to examine imputation with random covariance matrices did so in 
the context of individual patient data meta-analysis (Quartagno & Carpenter, 2016b), and it did 
not examine fully conditional specification. Given the compatibility issues highlighted 
previously, it is important to understand the relative strengths and weaknesses of these two 
approaches, as they will not necessarily produce comparable results. To address this gap in the 
literature, we designed a Monte Carlo simulation study with two within-subjects factors (missing 
data handling method and cause of missingness) and five between-subjects factors: intraclass 
correlation (ICC = .10 and .50), residual correlation between the random intercepts and slopes (0 
and .50), number of clusters (J = 15, 30, and 100), within-cluster sample size (n; = 5, 15, 30, and 
50), and MAR missing data rate (10%, 20%, and 30%). We generated 2000 artificial data sets 


within each of the 144 between-subjects design cells, resulting in 288,000 replications. 
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Previous studies of fully conditional specification (Enders et al., 2017; Grund, Ltidke, et 
al., 2016) and recommendations from the literature guided our design choices. For example, the 
ICC values of .10 and .50 are common in published applications and are consistent with those 
from cross-sectional and repeated measures designs, respectively (Gulliford, Ukoumunne, & 
Chinn, 1999; Hedges & Hedberg, 2007; Murray & Blitstein, 2003; Spybrook et al., 2011). In 
choosing the level-1 and level-2 sample sizes, we attended to recommendations from the 
literature (Kreft & de Leeuw, 1998; Maas & Hox, 2005), but we also selected small values 
capable of introducing bias (McNeish & Stapleton, 2016). It is difficult to identify typical 
missing data rates because published manuscripts do not routinely report this information, so we 
selected values large enough to expose small-sample biases in variance estimates (Enders et al., 
2017). 

Data Generation 

The random coefficient analysis from Equation | served as the true population model for 
the simulations. The data generation procedure also included level-1 and level-2 auxiliary 
variables, a and a2, respectively. As described later, a; determined missingness probabilities for 
x in one of the simulations (y was the cause of missingness in the second), and a2 always served 
as the cause of missingness on w. To specify effect sizes on a convenient metric, we used the 
values in Table | to define the total correlations among the variables. We chose correlations of 
.30 to align with Cohen’s (1988) medium effect size benchmark, and we set the correlations 
between the analysis and auxiliary variables to .40 to ensure that bias would result if these 
additional variables are omitted from imputation (Collins et al., 2001). We also set the 
correlation between x and y to .40 so we could manipulate the cause of missingness (i.e., ai or y) 


without changing the strength of the MAR selection mechanism. 
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We solved for the population parameters using the following steps. First, we fixed the 
total variance of all variables to unity. The level-2 predictor w and the auxiliary variables had all 
variation assigned to a single level, whereas x and y had variation at both levels, with between- 
cluster variance set to either .10 or .50 (ICC = .10 and .50, respectively). Second, we computed 
the level-1 and level-2 covariance matrices, 2 and £@), by pre- and post-multiplying the total 
correlation matrix by a diagonal matrix containing the level-1 or level-2 standard deviations 
(e.g., in the ICC = .10 condition the level-1 and level-2 standard deviations of x (or y) were V.90 
and v.10, respectively). The total covariance matrix was the sum of the within- and between- 
cluster matrices (i.e., = Z“) + £)), Third, we used the following expressions to solve for the 


model parameters 


B= Loaizomw) 
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correlation between the intercepts and slopes. We somewhat arbitrarily set the slope variance 
equal to 40% of the between-cluster variance, reasoning that this parameter should be large 
enough to reveal important differences between the imputation methods. 

Applying the previous expressions produced the parameter values in Equation 20. We 
then used the following steps to generate the data: (a) sample the within- and between-cluster 
components of the predictor variables from a normal distribution with zero means and 


5 


covariance matrices of Z ae and caw 


respectively, (b) compute x as the sum of its within- and 
between-cluster deviation scores, (c) sample bo; and b,; from a normal distribution with zero 
means and a covariance matrix Zp, (d) sample e;; from a normal distribution with variance oF 


and (e) compute y;; as a weighted sum of x, w, bo;, b,;, and e;;, as in Equation 20. The R data 


generation script is available upon request from the first author. 


Vij = 5.00 + .395(x;;) + .057(w;) + Boj + Di jxij + ei; 
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As noted previously, either the level-1 auxiliary variable a; or the outcome y determined 
missing values on x, and a2 served the same role for w. In all cases, higher scores on the cause of 
missingness produced higher rates of missing data. Using a logistic regression model, we defined 


the level-1 missingness probability for each observation as 
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etotA1Zij 


ae Np ESP (21) 
where A and A, are intercept and slope coefficients, respectively, and z;; is the standardized 
version of a; or y. Using the latent variable formulation for logistic regression (Agresti, 2012; 
Johnson & Albert, 1999), we derived coefficients that produced the desired missing data rate 
while maintaining a .50 correlation between the cause of missingness and the underlying normal 
propensity for missing data. The slope coefficient was A, = 1.815, and the intercept coefficients 
for the 10%, 20%, and 30% missing data rates were approximately Ag = -3.22, -2.10, and -1.31. 
Substituting z into Equation 21 produced an N-row vector of level-1 missingness probabilities, 
and we subsequently created an N-row vector of missing data indicators (0 = observed, | = 
missing) by sampling binary values from a binomial distribution with success rate equal to each 
observation’s missingness probability. An identical procedure produced missing values on w, 
such that p; and z; (the standardized version of az) replaced p;; and Z;; in the logistic function. 
Because different auxiliary variables determined level-1 and level-2 missingness, this deletion 
process induced a general missing data pattern where x or w (or both) could be missing for a 
particular observation. 
Imputation, Estimation, and Outcomes 

We used the R package jomo (Quartagno & Carpenter, 2016a) for joint model imputation 
and the Blimp application (Enders et al., 2017) for fully conditional specification. To diagnose 
convergence of the MCMC algorithm, we used Blimp to generate potential scale reduction 
factors (Gelman & Rubin, 1992) for a single replication in every design cell. These diagnostic 


values generally reached acceptable levels (e.g., < 1.05) in fewer than 1500 iterations. Based on 
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these convergence diagnostics, we generated 20 imputations® (Graham, Olchowski, & Gilreath, 
2007) from MCMC chains with 50,000 iterations, such that the first data set was saved after the 
2500" iteration and the remaining data sets were saved every 2500 iterations thereafter (i.e., in 
Blimp, we set the burn-in and thinning intervals to 2500). We relied on the default prior 
distributions of each software package under the assumption that most researchers would do the 
same. The imputation models for both procedures included the three analysis variables and two 
auxiliary variables. We used Mplus 8 (Muthén & Muthén, 1998-2017) to fit the analysis model 
to the multiply imputed data sets, and we wrote a custom R program to pool the resulting 
estimates and standard errors. All computational tasks were executed on UCLA’s Shared 
Hoffman2 Cluster, and we used a Linux shell script to coordinate the simulation components. 
The simulation code is available upon request from the first author. 

We examined three outcomes, relative bias, mean squared error, and confidence interval 
coverage. Relative bias was computed as the difference between an average estimate and the true 
population value divided by the true value and multiplied by 100 (i.e., bias as a percentage of the 
true value). In the conditions with orthogonal random effects, we used the non-zero covariances 
from Equation 20 as the divisor of the relative bias expression to avoid dividing by zero. 
Published simulation studies often define relative bias less than 10% in absolute value as 
acceptable (Finch, West, & MacKinnon, 1997; Kaplan, 1988). Mean squared error (MSE) is the 
average squared difference between an estimate and its true population value. This outcome is an 
overall measure of accuracy that captures the sum of squared bias and sampling variance. To 


facilitate interpretation, we defined an MSE ratio as MSEyu / MSErcs, such that values greater 


® When considering the design of the study, we examined a subset of results with 5 and 100 
imputations. Because bias values were effectively the same in both situations, we adopted 
Graham et al.’s (2007) recommendation to use 20 imputations. 
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than unity favor fully conditional specification (1.e., the joint model had larger squared errors), 
whereas values less than one favor the joint model. Finally, 95% confidence interval coverage 
was computed as the proportion of estimates where the 95% symmetric confidence interval (i.e., 
the estimate plus or minus 1.96 standard error units) included the true parameter value. Values 
lower than the nominal 95% rate reflect Type I error inflation (e.g., a coverage value of 90% 
suggests a twofold increase in Type I errors), whereas values greater than 95% reflect 
conservative inference. Confidence interval coverage is an indicator of standard error quality 
when estimates are unbiased. We consider coverage for only the slope coefficients because the 
literature argues that symmetric confidence intervals are inappropriate for variance estimates 
(Maas & Hox, 2005; Snijders & Bosker, 2012). 
Results 

The results are organized as follows. We begin by discussing the scenario where auxiliary 
variables determine missingness and the random intercepts and slopes are uncorrelated in the 
population. This situation is a useful baseline because the random covariance matrix approach 
should be approximately compatible with the data-generating model. Next, we examine the same 
missingness mechanism with correlated random effects. As described previously, this correlation 
should be detrimental to the joint model because imputation uses fewer parameters than the data- 
generating process. We did not expect this correlation to meaningfully influence fully 
conditional specification. Finally, we discuss any differences that result from an MAR 
mechanism where the outcome variable causes missingness in x. In the interest of space, we use 
a limited set of figures to highlight the main findings, and we present the results for all 
combinations of conditions in the online supplemental material [included here for reviewers]. 


Missingness Due to Auxiliary Variables and Uncorrelated Random Effects 
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Figures 1 and 2 are trellis plots displaying relative bias for the ICC = .10 simulation with 
15 and 30 clusters, respectively. In the interest of space, we relegate the 100-cluster plot to the 
online supplemental material because it was comparable to Figure 2. The figures highlight a few 
general trends. Not surprisingly, increasing the number of clusters from 15 to 30 reduced bias, 
particularly for the 62 coefficient and the intercept variance, estimates that rely heavily on the 
number of level-2 units. Similarly, increasing the within-cluster sample size from 5 to 15 reduced 
bias, but further increasing the group sizes did not impact most parameters. With the exception 
of the slope variance, using 30 clusters with at least 15 observations per group produced 
imputation estimates that exhibited roughly the same bias as complete-data estimates, even with 
30% missing data. 

Figures 1 and 2 suggest that both imputation approaches had difficulty preserving random 
slope variation, although they errored in different directions and by different amounts. 
Specifically, the joint model overestimated slope variance, with bias decreasing to reasonable 
levels (e.g., less than 10% in absolute value) when the within-cluster sample size was 30 or 
larger. The reliance on the within-cluster sample size is not surprising given that joint model uses 
random level-1 covariance matrices to preserve random slope variation (see Equation 6). On the 
other hand, fully conditional specification consistently underestimated slope variance, and the 
magnitude of this bias did not decrease in larger samples. Rather, bias was largely a function of 
missingness, with values exceeding 10% in absolute value at a 20% missing data rate. 

The trellis plots in Figures 3 and 4 display relative bias for the ICC = .50 simulation with 
15 and 30 clusters, respectively. The major trends from the ICC = .10 simulations were also 
evident here, so we do not reiterate these results. The primary difference in Figures 3 and 4 is 


that the joint model slope estimates were negatively rather than positively biased. We suspect 
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that the prior distribution (an inverse Wishart with identity scale matrix) is responsible for this 
change, perhaps because its mass better approximates that of the likelihood function. As before, 
the bias decreased to reasonable levels with group sizes of 30 or larger. 

Mean squared error ratios provide additional information about the relative performance 
of the two imputation approaches. Recall that mean squared error captures the overall accuracy 
of an estimate, with ratios greater than unity favoring fully conditional specification (i.e., joint 
model estimates were further from their true values, on average). For brevity, we report average 
ratios across the fixed regression coefficients and the level-2 covariance matrix elements and do 
not consider ratios for individual parameters. However, the online supplemental material 
includes trellis plots of the mean squared error ratios for all conditions. Mean squared error ratios 
for the regression coefficients favored fully conditional specification, albeit slightly, with 
average ratio values of approximately 1.02. The ratios for the level-2 covariance matrix differed 
by intraclass correlation. Fully conditional specification was superior in the ICC = .10 
conditions, with average ratio values of approximately 1.10, whereas the joint model was 
superior in the ICC = .50 conditions, with average ratios of about .96. 

No additional figures are needed to convey the confidence interval coverage results. 
Collapsing across other design cells, the average coverage value for the joint model regression 
slopes was .94 (min = .88, max = .99), and the corresponding average for fully conditional 
specification was .93 (min = .89, max = .99). To further convey the results, we examined the 
percentage of design cells with coverage values between .925 and .975, an interval defining the 
so-called liberal criterion from Bradley (1978). These results varied by intraclass correlation but 
consistently favored joint model imputation. Specifically, in the ICC = .10 simulations, 


approximately 81.9% of the joint model coverage values fell in Bradley’s interval, as compared 
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to 75% for fully conditional specification. In the ICC = .50 conditions, the corresponding values 
were 75% and 59.7%. 
Missingness Due to Auxiliary Variables and Correlated Random Effects 

Next, we consider the results from the simulation conditions where the residual 
correlation between the random intercepts and slopes equaled .50. As described previously, a 
non-zero covariance should be detrimental to the joint model because imputation uses fewer 
parameters than the data-generating process. To illustrate these results, the trellis plot in Figure 5 
displays relative bias values from the ICC = .50 condition with 30 clusters. As expected, the joint 
model underestimated the level-2 covariance, although the magnitude of this bias was not as 
large as one might expect. In particular, bias was generally in a tolerable range (e.g., less than 
10% in absolute value) if the missing data rate was 20% or less. Consistent with the previous 
results, joint model imputation improved as the within-cluster sample size increased. Mean 
squared error ratios were largely similar to those from the simulation with uncorrelated random 
effects. For example, ratios for the level-2 covariance matrix elements again differed by 
intraclass correlation, with average ratio values of about 1.06 and .93 in the ICC = .10 and .50 
conditions, respectively. 
Missingness Due to the Outcome Variable 

To explore whether our simulation results can generalize to other MAR processes, we 
repeated the previous simulations, this time treating y as the cause of missingness on x. The 
level-2 auxiliary a2 again determined missingness for w, and the imputation routines always 
included both auxiliary variables. Manipulating the cause of missingness produced no 
meaningful changes to the results. For completeness, the online supplemental material includes a 


full set of trellis plots displaying relative bias, but no additional plots are needed to convey these 
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results here. Similarly, the mean squared error ratios and confidence interval coverage results 
were virtually identical to those from the previous simulation, so no further discussion of these 
results is warranted. 
Real Data Example 

In this section we use the R package jomo (Quartagno & Carpenter, 2016a) and the 
Blimp application (Enders et al., 2017) to illustrate joint model imputation and fully conditional 
specification, respectively. The Blimp application for macOS, Windows, and Linux is a free 
download available at www.appliedmissingdata.com/multilevel-imputation. The data set for the 
analysis mimics a cluster-randomized design from an educational study of math achievement 
where 50 schools with 25 students each are randomly assigned to a novel math curriculum 
condition (the intervention) or a standard curriculum condition (the control) (Montague, Krawec, 
Enders, & Dietz, 2014). The analysis model examines the influence of the intervention dummy 
code (0 = control, 1 = intervention), controlling for math self-efficacy and teacher experience in 


years. 


math; = Bo + By (mathse;;) + By (teachexp; ) + B3 (condition, ) 
(22) 
+ boj + b,;(mathse;;) + ei; 


The intervention dummy code is complete but all other analysis variables have missing values. 
For the purposes of the analysis example, we treat math self-efficacy (a 6-point rating scale) as 
continuous, but jomo and Blimp have facilities for imputing categorical variables (Carpenter & 


Kenward, 2013; Enders et al., 2017). 
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The online supplemental material gives the jomo and Blimp syntax files and R code for 
the analysis and pooling steps. The Blimp commands can also be implemented via pull-down 
menus from the graphical user interface (macOS and Windows only). In the jomo package, 
random level-1 covariance matrices are invoked by specifying the imputation method as 
“random”, whereas Blimp denotes a random association by joining two variables with a colon 
(e.g., “math:mathse’”) on the MODEL line. We refer readers to the software documentation files 
for additional details on syntax conventions (Enders et al., 2017; Quartagno & Carpenter, 
2016a). Table 2 gives the estimates from the two approaches. Consistent with the simulation 
results, the imputation methods produced similar fixed effect estimates but different variance 
components. In particular, fully conditional specification produced level-2 covariance matrix 
elements that were uniformly smaller than those of joint model imputation. Pooled likelihood 
ratio tests (Meng & Rubin, 1992) obtained from the mitml package (Grund, Robitzsch, & Liidke, 
2016) indicated that the slope variance (and covariance) were significant, although the joint 
model test statistic was somewhat larger in value at F(2,1176.28) = 13.60, p < .001 versus 
F(2,567.04) = 6.51, p = .002. Of course, it is difficult to form judgments from a single sample, 
but the analysis results highlight that the two imputation approaches can produce divergent 
estimates. 

Discussion 

Procedures for handling multilevel missing data have received less attention in the 
literature relative to single-level analysis problems. Existing research has largely focused on 
random intercept models, and literature addressing random slopes is particularly scant. To date, 
the few studies to examine missing data handling for these models have focused on the fully 


conditional specification framework and “reverse random coefficient” imputation (Enders et al., 
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2017; Grund, Ltidke, et al., 2016). Although it has not received much attention in the literature, a 
joint modeling strategy that uses random within-cluster covariance matrices to preserve cluster- 
specific associations (Yucel, 2011) is a promising alternative for random coefficient analyses. 
This approach was recently implemented in the R package jomo (Quartagno & Carpenter, 
2016a), and the only study to examine its performance did so in the context of individual patient 
data meta-analysis (Quartagno & Carpenter, 2016b). 

The primary goal of our manuscript was to compare fully conditional specification to 
joint model imputation with random level-1 covariance matrices. A vital consideration with 
multiple imputation is whether (and how) the imputation model preserves features of the 
analysis, a notion described as congeniality (Meng, 1994; Schafer, 2003) or compatibility 
(Bartlett et al., 2014; Carpenter & Kenward, 2013). Simulation results suggest that fully 
conditional specification can produce negatively biased slope variance estimates (Enders et al., 
2017; Grund, Ltidke, et al., 2016), and we show that an incorrect distributional assumption is 
responsible for this issue; the reverse regression model’s random effects cannot be normally 
distributed if the analysis model’s random effects are also normal (a standard assumption in 
multilevel modeling applications). In a similar vein, we show that joint model imputation with 
random covariance matrices is potentially problematic because it fails to preserve covariation 
between the intercepts and slopes. 

Our simulation results convey a number of take-home messages to substantive 
researchers. First, both imputation procedures generally produced tolerable biases when the 
missing data rate was 10% or less and the sample was comprised of at least 30 clusters with 15 
observations per group. These favorable conditions may encompass a wide range of multilevel 


modeling applications. Second, in terms of overall accuracy, fully conditional specification 
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appears to be superior when the intraclass correlation is low (e.g., ICC = .10), whereas joint 
model imputation is somewhat better when the intraclass correlation is high (e.g., ICC = .50). 
Roughly speaking, this suggests that fully conditional specification is preferable for cross- 
sectional data, and the joint model is better for within-subjects designs. Third, the missing data 
pattern is an important determinant of imputation quality. Problems with fully conditional 
specification are generally restricted to incomplete predictors with random slopes, and the 
incompatibilities that we described earlier in the paper would not occur with incomplete 
outcomes or fixed predictors. On the other hand, there is no reason to expect joint model 
imputation to improve if y is missing and x is complete. 

Perhaps the most important take-home message from our study is that neither fully 
conditional specification nor joint model imputation are optimal for random slope analyses with 
incomplete predictors. Fully Bayesian (1i.e., model-based) imputation (Erler et al., 2016; Ibrahim, 
Chen, & Lipsitz, 2002; Zhang & Wang, 2017) and its close relative substantive model- 
compatible imputation (Bartlett, Seaman, White, & Carpenter, 2014; Enders, Du, & Keller, 
2018; Goldstein, Carpenter, & Browne, 2014) are promising alternatives that are starting to 
receive attention in the literature. Erler et al. (2016) and Grund, Liidke, and Robitzsch (2018) 
illustrate model-based Bayesian imputation with the JAGS software package (Plummer, 2016), 
and a substantive model-compatible variant of fully conditional specification was recently 
implemented in the Blimp application (Enders et al., 2018; Keller & Enders, 2018). Both 
procedures avoid bias by respecifying the conditional distribution of the incomplete covariates 
(e.g., the problematic reverse random coefficient model from Equation 11) into a composite 
function, the components of which are compatible with the analysis model. Limited simulation 


results suggest that these procedures can provide a rather dramatic improvement over 
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conventional imputation routines when the analysis model includes random coefficients and 
other types of interactive effects (Enders et al., 2018; Grund et al., 2018). For interested readers, 
the online supplemental material gives the Blimp code for applying substantive model- 
compatible imputation to a random slope analysis, and this material also gives plots displaying 
relative bias values from a subset of our simulation conditions. Additional analysis examples 
(e.g., for models with interactive terms) are available in the Blimp user guide (Keller & Enders, 
2018). 

Though our study addresses an important gap in the methodological literature, our 
simulations lack generalizability because we necessarily investigated a limited set of conditions. 
Future studies could investigate different combinations of level-1 and level-2 sample sizes, 
different effect sizes, and more complex models. The interaction of model complexity and 
sample size could be particularly important for understanding joint model imputation, given the 
influence of within-cluster sample size on imputation quality. Also, it could be important to 
investigate the impact of unequal group sizes on fully conditional specification, as the imputation 
model for level-2 variables (e.g., see Equation 12) does not directly adjust for the fact that cluster 
means are potentially derived from different numbers of level-1 units. Analytic and simulation 
results suggest that manifest cluster means provide good performance in most situations but can 
introduce slight biases with low intraclass correlations, few observations per cluster, and 
extremely unbalanced cluster sizes (Grund et al., 2017). A future release of Blimp will feature 
the option to invoke the latent cluster mean procedure described in Grund et al. (2017). Next, we 
limited this preliminary comparison to normally distributed variables. Nonnormal continuous 
variables are potentially problematic for multiple imputation (Yuan, Yang-Wallentin, & Bentler, 


2012; Yucel & Demirtas, 2010), so future studies should examine whether the two imputation 
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frameworks are differentially impacted by this common problem. Additionally, both approaches 
readily accommodate incomplete categorical variables; Blimp uses a probit regression 
framework (i.e., latent variable imputation) for ordinal or nominal variables, and jomo offers the 
same functionality for nominal variables (imputation for nominal variables can also 
accommodate ordered categories, albeit it with a richly parameterized model). Although both 
software programs use the same underlying latent variable definition of categorical variables 
(Albert & Chib, 1993; Carpenter & Kenward, 2013; Johnson & Albert, 1999), implementing this 
approach in the joint model framework could greatly accentuate model complexity-related 
issues, particularly with incomplete level-1 variables and random slopes. Finally, it is important 
to emphasize that we considered a rather conventional multilevel regression model. Further 
studies could examine the use of multiple imputation with other covariance structures or analytic 
frameworks (e.g., random intercept models, multilevel structural equation models, generalized 
estimating equations). 

In sum, relatively few studies have examined missing data handling for random 
coefficient models, and ours was the first to compare fully conditional specification to joint 
model imputation with random covariance matrices (Yucel, 2011). Although both approaches 
can introduce bias-inducing incompatibilities, they tend to do so in somewhat different 
situations. Nevertheless, the potential for problematic biases underscores the importance of 


promising new Bayesian imputation procedures (Enders et al., 2017, October; Erler et al., 2016). 
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Table 1 


Population Correlations for Computer Simulation 


y x w av\ av2 
hy 1.0 0.4 0.3 0.4 0 
2d 0.4 1.0 0:3 0.4 0 
3.w 0:3 0.3 1.0 0.0 0.4 
4. avi 0.4 0.4 0.0 1.0 0 


5. av2 0 0 0.4 0 1.0 
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Table 2 


Real-Data Analysis Results 


Fully Conditional Specification 


Effect Est. SE t df 2) 
Intercept 34.08 2.30 14.79 6324.43 0.00 
Self-Efficacy 186 034 549 751.07 0.00 
Teacher Exp. 0.58 0.14 4.11 8483.01 0.00 
Condition 3.00 1.33 2.25 318841 0.02 
Intercept Var. 29.05 
Covariance -5.75 
Slope Var. 2.40 
Residual Var. 89.92 
Joint Model with Random Covariance Matrices 
Effect Est. SE t df 2) 
Effect 33.97 2.38 14.26 1924.85 0.00 
Self-Efficacy 186 037 5.01 1818.39 0.00 
Teacher Exp. 0.60 0.14 430 5138.86 0.00 
Condition 2.78 1.30 2.15 11586.23 0.03 
Intercept Var. 40.29 
Covariance -9.74 
Slope Var. a4 
Residual Var. 89.00 
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Figure 1. Average relative bias values for design cells with ICC = .10 and 15 clusters. Relative 


bias is defined as the difference between an average estimate and the true value expressed as a 


proportion of the true value. The dashed lines represent bias values of + 0.10. 
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Figure 2. Average relative bias values for design cells with ICC = .10 and 30 clusters. Relative 


bias is defined as the difference between an average estimate and the true value expressed as a 


proportion of the true value. The dashed lines represent bias values of + 0.10. 
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proportion of the true value. The dashed lines represent bias values of + 0.10. 
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Figure 4. Average relative bias values for design cells with ICC = .50 and 30 clusters. Relative 


bias is defined as the difference between an average estimate and the true value expressed as a 


proportion of the true value. The dashed lines represent bias values of + 0.10. 
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Supplement I: Blimp Syntax for Substantive Model-Compatible Imputation 
The syntax below applies substantive model-compatible imputation from Enders et al. (2018) to 
an artificial data set from the simulations reported in the body of the paper. Additional 


information about Blimp code is available in the user’s guide (Keller & Enders, 2018). 


DATA: ~/desktop/data.csv; 
VARNAMES: id al a2 yx w; 

! specify random slope with y:x; 
MODEL: id ~ al a2 y:x w; 

! specify outcome variable from the substantive analysis; 
OUTCOME: y; 

BURN: 2500; 

THIN: 2500; 

NIMPS: 20; 

MISSING: 999; 

SEED: 90291; 

OUTFILE: ~/desktop/imp*.csv; 


OPTIONS: separate; 
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Supplement J: Simulation Results Evaluating Substantive Model-Compatible Imputation 
To provide a preliminary evaluation of substantive model-compatible fully conditional 
specification (SMC-FCS), we applied the procedure to the subset of simulation conditions with 
an ICC = .50 and correlated random effects. The trellis plots display relative bias values for this 
new procedure as well as the conventional imputation approaches evaluated in the manuscript. 


Additional details on SMC-FCS can be found in Enders et al. (2018). 
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Supplemental Figure 1. Average relative bias values for design cells with ICC = .50 and 15 
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clusters. Relative bias is defined as the difference between an average estimate and the true value 


expressed as a proportion of the true value. The dashed lines represent bias values of + 0.10. 
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Supplemental Figure 2. Average relative bias values for design cells with ICC = .50 and 30 
clusters. Relative bias is defined as the difference between an average estimate and the true value 
expressed as a proportion of the true value. The dashed lines represent bias values of + 0.10. 
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Supplemental Figure 3. Average relative bias values for design cells with ICC = .50 and 100 
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clusters. Relative bias is defined as the difference between an average estimate and the true value 


expressed as a proportion of the true value. The dashed lines represent bias values of + 0.10. 
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